FOREWORD


The Army Research Institute for the Behavioral and Social Sciences
(ARI) maintains a field unit with the U.S. Army, Europe (USAREUR) to
conduct research to meet the special needs of USAREUR and to evaluate
other research projects and products under front-line operational
readiness requirements, with feedback leading to modification and
refinement.

Sustainment training is receiving increasing emphasis in the Army as the
only viable approach to maintaining performance on critical combat unit
tasks. One requirement of the sustainment approach to training
management is detailed knowledge of both the rate at which performance
degrades over time and the time required to train an individual on critical
tasks. This information is ultimately related to some measure of task
difficulty.

This report evaluates the possibility of using expert ratings of task
difficulty to provide the data base on task difficulty required by the
sustainment model. The research was conducted under Army Project 2Q762722A764,
"Training and Education".


A COMPARISON OF EXPERT RATINGS OF TASK DIFFICULTY WITH AN INDEPENDENT CRITERION


BRIEF 


Requirement: 

To evaluate the possibility of using "expert" ratings of task difficulty
to assess actual task difficulty.


Procedure:

Expert ratings of task difficulty, obtained from squad leaders and platoon
leaders, were compared with Skill Qualification Test (SQT) results for
soldiers at Skill Level One (SQT 2) and Skill Level Three (SQT 4). Additionally,
SQT 2 results were compared with SQT 4 results, and with SQT 2 results for
the SQT given five quarters later. The percent of soldiers missing the
written component of a particular SQT task was used as the independent
measure of task difficulty.


Findings:

There was no significant correlation between the expert ratings of
difficulty and difficulty as indicated by SQT results. There were high
correlations between SQT 2 and SQT 4 results and between results for
successive administrations of the SQT. The conclusion was that expert ratings
of difficulty may not be representative of actual task difficulty.


Utilization of Findings:

The findings suggest that it may be necessary to investigate methods
of improving expert ratings of task difficulty in order to derive ratings
representative of actual task difficulty.
A COMPARISON OF EXPERT RATINGS OF TASK DIFFICULTY WITH AN INDEPENDENT CRITERION


INTRODUCTION 

Training management in the US Army requires the training manager to
possess detailed information about individual tasks for planning and
scheduling purposes. This information must include task difficulty, which
serves as a guide for setting priorities when planning training and as
an indicator of the training time needed when scheduling it. The
most common method of determining task difficulty is to ask for "expert"
opinion on how difficult it is to train an individual on a particular
task. The expert is usually someone, such as a first-line
supervisor, who is thought to have insight into the training process.

In the absence of empirical information, these experts must be
considered to be making judgements about task difficulty under conditions
of uncertainty. This may lead to systematic errors of judgement resulting
from erroneous intuitions about the nature of the factor being judged.
In other words, the judges may have personal conceptions of what
constitutes a difficult task, which may or may not be representative of actual
difficulty (Kahneman and Tversky, 1977; Tversky and Kahneman, 1977).

Expert opinion is rarely tested against an independent criterion measure of
the factor being judged. Such testing is needed to ensure that there is an
empirical foundation for judgements of task difficulty.


PURPOSE 

The research reported here was conducted to evaluate expert ratings
of task difficulty. Ratings of task difficulty represent a potential
data base for developing predictive models of retention for specific
tasks. Such a model is needed if a sustainment approach to training is
eventually to be adopted by Army battalions. The sustainment approach
requires that the training manager have access to the degradation rates
for all tasks on which the battalion must train. From this information
the training manager can determine how often each task must be trained
to produce an acceptable level of sustainment.
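The step from a degradation rate to a training frequency can be illustrated with a small calculation. This is only a sketch under assumptions that this report does not make: it supposes that proficiency decays exponentially after training, P(t) = P0·exp(-k·t), and asks how long it takes to fall to a sustainment threshold. The function name `retraining_interval`, the decay rates, the starting proficiency, and the threshold are all hypothetical.

```python
import math

def retraining_interval(p0: float, threshold: float, decay_rate: float) -> float:
    """Time for proficiency to decay from p0 to threshold.

    Hypothetical model: P(t) = p0 * exp(-decay_rate * t), so the interval
    solves threshold = p0 * exp(-decay_rate * t) for t.
    """
    if not 0 < threshold <= p0:
        raise ValueError("threshold must be positive and no larger than p0")
    if decay_rate <= 0:
        raise ValueError("decay_rate must be positive")
    return math.log(p0 / threshold) / decay_rate

# Hypothetical tasks with illustrative weekly decay rates (not data from this study).
for name, k in [("slow-decaying task", 0.05), ("fast-decaying task", 0.20)]:
    weeks = retraining_interval(p0=0.95, threshold=0.70, decay_rate=k)
    print(f"{name}: retrain about every {weeks:.1f} weeks")
```

Under this assumed model a faster-decaying task gets a proportionally shorter retraining interval, which is why the sustainment approach needs a degradation rate for every task.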

Ratings of task difficulty, obtained from squad leaders and platoon
leaders, were compared with Skill Qualification Test (SQT) results for
soldiers in grades E1-E4. The SQT is a performance-based measure of job
proficiency consisting of a number of task tests constructed
using behaviorally derived scoring standards. The SQT may have a hands-on
component, a performance certification component, and a written component.
The written component consists of a number of task tests, each
represented by a set of items designed to measure the essential behaviors or steps
in performing the task. The exact nature of the SQT varies with the
responsibilities of the military grade, and thus the skill level, of the
subpopulation being tested. Research has shown that performance on written tests
correlates highly with performance on actual performance tests
if the written tests are criterion referenced to the same set of criteria
as the actual performance test (Osborn and Ford, 1978). The SQT is criterion
referenced, and each task on the written component is validated against actual
performance on the task (Maier and Hershfield, 1978; Osborn et al., 1977).

The results of the written component of the SQT should relate to task
difficulty, with more difficult tasks being missed more frequently than less
difficult tasks. This follows indirectly from the fact that the percent of
individuals missing a question on the written component of the SQT is a direct
measure of the item difficulty of that question (Steinheiser et al.,
1978). If the item is validated as discriminating between performers and
non-performers, then item difficulty is related to task difficulty,
and performance on a set of items is a relative measure of the actual
difficulty of any particular task. The assertion that SQT results are
representative of difficulty might not hold if soldiers were trained to a
high level of performance on difficult tasks and a low level of performance
on easy tasks. This would normally not be the case, since one-trial
performance is the US Army criterion. Additionally, there is such a variety
of training methods and training priorities within the US Army that the
likelihood of a systematic bias in training would be small. In the best
case, then, all tasks would be trained to criterion, and in the
worst case tasks would be trained to criterion at random (Yates, 1979).
With this in mind, the written component of SQT2 was selected as the
independent criterion against which the expert ratings of task difficulty were
evaluated. SQT2 was selected because of the large number of soldiers and
organizations represented, which limits the introduction of a systematic training
bias. Also, SQT2 is for low-ranking soldiers who have been in the service
for a relatively short period of time. This helps to ensure that
overlearning on any set of tasks has not occurred, and that most soldiers do not
pass most tasks.
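The item-difficulty index described above is simply the percentage of examinees who miss an item or task. The following is a minimal sketch, assuming that pass/fail results are available as one record per soldier per task. The record layout, the function name `percent_missing`, and the sample records are hypothetical and are not SQT data; the SQT reports used in this study already give the percentages directly.

```python
from collections import defaultdict

def percent_missing(results):
    """Return {task: percent of soldiers who missed the task}.

    `results` is an iterable of (soldier_id, task_id, passed) tuples,
    a hypothetical record layout used only for illustration.
    """
    attempts = defaultdict(int)
    misses = defaultdict(int)
    for _soldier, task, passed in results:
        attempts[task] += 1
        if not passed:
            misses[task] += 1
    return {task: 100.0 * misses[task] / attempts[task] for task in attempts}

# Hypothetical records: two tasks, four soldiers each.
records = [
    ("s1", "task-1", True), ("s2", "task-1", False),
    ("s3", "task-1", True), ("s4", "task-1", True),
    ("s1", "task-2", False), ("s2", "task-2", False),
    ("s3", "task-2", True), ("s4", "task-2", False),
]
difficulty = percent_missing(records)
# A higher percent missing is read as a more difficult task.
for task in sorted(difficulty, key=difficulty.get, reverse=True):
    print(task, difficulty[task])
```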

Using ratings of difficulty, as opposed to actually measuring difficulty
or obtaining SQT results, has several practical advantages. There are
hundreds of tasks in the US Army inventory of tasks, and the logistics of
collecting data on each task are enormous. Obtaining ratings from experts
is a relatively simple and low-cost process. SQT results could provide
part of the answer, but there are no SQTs yet for many Military Occupational
Specialties (MOSs). Additionally, many of the tasks for any particular MOS
that has an SQT are never tested, leaving the data base for that MOS incomplete.


METHOD 

Judgements were obtained, during a previous research effort, from
sixty-eight (68) randomly selected squad leaders and platoon leaders in
mechanized infantry units regarding the difficulty of eighteen (18) Skill
Level One (SL1) individual tasks for E1-E4 soldiers in MOS 11B. The judges
were asked to rate the difficulty of each task as easy, moderately
difficult, or extremely difficult (Bonner, 1978). These data were then converted
into a form more amenable to analysis. The categories were assigned the values
easy=0, moderately difficult=50, and extremely difficult=100, and each task's
rated difficulty was expressed on a 0-100 scale as the mean of these values,
weighted by the frequency with which each category was chosen.
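The weighting just described can be checked against the judgement percentages in Table 1. In this sketch the function name is hypothetical, and the round-half-up convention is an inference from the published column (for example, 49/41/10 percent gives 30.5, printed as 31), not a rule stated in the report. Integer arithmetic is used so that exact halves are not disturbed by floating-point error.

```python
# Scale values this report assigns to the three rating categories.
SCALE_EASY, SCALE_MODERATE, SCALE_VERY = 0, 50, 100

def weighted_difficulty(pct_easy: int, pct_moderate: int, pct_very: int) -> int:
    """Frequency-weighted mean of the category scale values (0-100 scale).

    Rounds halves upward (assumed convention, inferred from Table 1);
    Python's built-in round() would turn 30.5 into 30 instead.
    """
    total = pct_easy + pct_moderate + pct_very
    numerator = (SCALE_EASY * pct_easy + SCALE_MODERATE * pct_moderate
                 + SCALE_VERY * pct_very)
    # floor(numerator / total + 1/2), computed exactly with integers
    return (2 * numerator + total) // (2 * total)

# Table 1: task -> (% easy, % moderately difficult, % very difficult, published value)
TABLE1 = {
    "071-11A-0001": (66, 30, 4, 19),
    "071-11A-0150": (66, 28, 6, 20),
    "071-11A-0502": (49, 41, 10, 31),
    "071-11A-0511": (66, 24, 10, 22),
    "071-11A-0704": (38, 43, 19, 41),
    "071-11A-0705": (44, 43, 13, 35),
    "071-11A-0703": (51, 40, 9, 29),
    "071-11A-0801": (81, 13, 6, 13),
    "071-11A-0960": (54, 28, 18, 32),
    "071-11A-2003": (82, 18, 0, 9),
    "071-11A-2004": (63, 33, 4, 21),
    "071-11B-2006": (69, 28, 3, 17),
    "071-11A-2104": (50, 43, 7, 29),
    "071-11A-2401": (57, 33, 10, 27),
    "071-11A-1501": (65, 29, 6, 21),
    "071-11A-4402": (71, 23, 6, 18),
    "071-11A-4502": (65, 28, 7, 21),
    "071-11A-4503": (37, 40, 23, 43),
}

mismatches = []
for task, (easy, moderate, very, published) in TABLE1.items():
    computed = weighted_difficulty(easy, moderate, very)
    if computed != published:
        mismatches.append((task, computed, published))
print("rows not reproduced:", mismatches)
```

With half-up rounding, every row of the Weighted Difficulty column in Table 1 is reproduced.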

Each major command receives a quarterly SQT report listing the percent
of men failing to pass each task on the SQT. This SQT report was obtained,
and data were extracted for the same eighteen (18) tasks of interest for MOS 11B
soldiers in grades E1-E4, from the same population of mechanized infantry
units as the squad leaders and platoon leaders. For each of the individual
tasks of interest, the percent of men missing the task on the written component
was taken as a direct measure of task difficulty. The working assumption of this
report is that the less often a task was missed, the less difficult it is.

The SQT data were derived from the written component results for 2,003
E1-E4 soldiers. Table 1 shows the category frequencies and weighted difficulty
for the expert judgements, and the percent missing for the SQT data.


TABLE 1

Percent of Soldiers Missing SQT2 Task, Percent of Judgements in
Each Category, and Weighted Difficulty

                SQT2*        JUDGEMENTS** (%)
                % Soldiers            Moderately  Very        Weighted
Task            Missing      Easy     Difficult   Difficult   Difficulty

071-11A-0001    51           66       30          4           19
071-11A-0150    41           66       28          6           20
071-11A-0502    26           49       41          10          31
071-11A-0511    42           66       24          10          22
071-11A-0704    25           38       43          19          41
071-11A-0705    63           44       43          13          35
071-11A-0703    47           51       40          9           29
071-11A-0801    49           81       13          6           13
071-11A-0960    53           54       28          18          32
071-11A-2003    52           82       18          0           9
071-11A-2004    52           63       33          4           21
071-11B-2006    30           69       28          3           17
071-11A-2104    46           50       43          7           29
071-11A-2401    58           57       33          10          27
071-11A-1501    73           65       29          6           21
071-11A-4402    69           71       23          6           18
071-11A-4502    37           65       28          7           21
071-11A-4503    25           37       40          23          43

*N = 2,003
**N = 68

r = -.375, df = 16, p > .10
Since it is possible that the judges rated task difficulty in terms of
their own ability and personal experience with the task, SQT written
component results were also obtained for SQT4 (N=349). This is the SQT that a
squad leader would be expected to take. Although these data are for squad
leaders, similar results could be expected from platoon leaders, since they
have a degree of competence on the tasks similar to that of squad leaders. This
similar competence is due to the nature of the branch training platoon
leaders receive as they complete their basic officer training. Therefore,
SQT4 results should be representative of task difficulty for both squad
leaders and platoon leaders. There were nine (9) written component tasks
in common between SQT2 and SQT4, and seven (7) written component tasks
in common between SQT4 and the difficulty ratings. These tasks, with
the percent of soldiers missing each task and the weighted difficulty,
are shown in Table 2.


TABLE 2

Percent of Soldiers Missing Task on SQT2 and SQT4, and Weighted Difficulty

                SQT2*         SQT4**        Weighted***
                % Soldiers    % Soldiers    Difficulty
Task            Missing       Missing

071-11A-0001    51            41            19
071-11A-0705    62            47            35
071-11A-0960    53            51            32
071-11A-2003    52            52            9
071-11A-2104    46            38            29
071-11A-2304    88            77            no data
071-11A-2401    58            30            27
071-11A-1501    73            57            21
071-11A-4505    74            51            no data

*N = 2,003
**N = 349
***N = 68

r = +.774, df = 7, p < .02 (between SQT2 and SQT4)
r = -.264, df = 5, p > .10 (between SQT4 and Weighted Difficulty)
There is also a possibility that SQT results are not stable in terms
of relative task difficulty. This could result from differences in the
nature or wording of particular questions on successive forms of the
written component of the SQT. As a check on this stability, the SQT2
results (SQT2A) that were correlated with the ratings of task difficulty
were compared with the SQT2 results for the SQT (SQT2B) given to the same
population of mechanized infantry battalions five quarters later. The
fourteen (14) tasks in common between the two SQTs, and the percent of
soldiers missing each task, are shown in Table 3.

TABLE 3

Percent of Soldiers Missing Task on Successive Administrations of SQT2

                SQT2A*        SQT2B**
                % Soldiers    % Soldiers
Task            Missing       Missing

071-11A-0150    41            34
071-11A-0502    26            25
071-11A-0511    42            36
071-11A-0704    25            29
071-11A-0705    63            60
071-11A-0703    47            41
071-11A-0801    49            48
071-11A-2003    52            49
071-11A-2004    52            75
071-11B-2006    30            33
071-11A-2104    46            52
071-11A-2401    58            48
071-11A-1501    73            69
071-11A-4503    25            29

*N = 2,003
**N = 1,337

r = +.854, df = 12, p < .001
RESULTS


The percent of soldiers missing the task on SQT2 and the weighted
task difficulty, for each of the eighteen (18) tasks in Table 1, were
correlated by the Pearson method. The correlation was found to be
nonsignificant (r=-.375, df=16, p>.10).
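The Pearson coefficient used throughout this section is r = Sxy / sqrt(Sxx·Syy), where Sxy is the sum of cross-products of deviations from the means and Sxx and Syy are the sums of squared deviations. The sketch below implements that formula directly and applies it to the two columns of Table 1 as printed. The function name `pearson_r` is our own, and the computation is illustrative, not the authors' procedure.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length sequences of at least 3 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Table 1 columns, in row order: SQT2 percent missing and weighted difficulty.
sqt2_missing = [51, 41, 26, 42, 25, 63, 47, 49, 53, 52, 52, 30, 46, 58, 73, 69, 37, 25]
weighted = [19, 20, 31, 22, 41, 35, 29, 13, 32, 9, 21, 17, 29, 27, 21, 18, 21, 43]

r = pearson_r(sqt2_missing, weighted)
df = len(sqt2_missing) - 2  # degrees of freedom for testing r
print(f"r = {r:.3f}, df = {df}")
```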

The percent of soldiers missing a task on SQT2 was correlated by the
same method with the percent of soldiers missing the same task on SQT4
for each of the nine (9) tasks in Table 2. The correlation was significant
(r=+.774, df=7, p<.02).

The percent of soldiers missing a task on SQT4 was correlated by the same
method with the weighted task difficulty for each of the seven (7) tasks
in Table 2 that have ratings. The correlation was nonsignificant (r=-.264, df=5, p>.10).

The percent of soldiers missing a task on the earlier SQT2 (SQT2A)
was correlated by the same method with the percent of soldiers missing
the same task on the later SQT2 (SQT2B) for the fourteen (14) tasks in
Table 3. The correlation was highly significant (r=+.854, df=12, p<.001).
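Each significance statement above can be checked from r and df alone, because t = r·sqrt(df / (1 - r²)) follows a Student t distribution with df degrees of freedom when the population correlation is zero. The sketch below uses only the standard library. It obtains the two-tailed p-value by integrating the t density with Simpson's rule. That numerical method is our substitute for a printed t table and is not a procedure described in the report. The function names are hypothetical.

```python
import math

def t_from_r(r, df):
    """t statistic for testing zero correlation, with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # area under the density between 0 and |t|
    return 1.0 - 2.0 * central

# Correlations and degrees of freedom as reported in this section.
reported = [
    ("SQT2 vs weighted difficulty", -0.375, 16),
    ("SQT2 vs SQT4", 0.774, 7),
    ("SQT4 vs weighted difficulty", -0.264, 5),
    ("SQT2A vs SQT2B", 0.854, 12),
]
p_values = {}
for label, r, df in reported:
    t = t_from_r(r, df)
    p_values[label] = two_tailed_p(t, df)
    print(f"{label}: t = {t:+.2f}, df = {df}, two-tailed p = {p_values[label]:.4f}")
```

The computed p-values fall on the same side of each threshold the report states (p > .10, p < .02, p > .10, p < .001).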


DISCUSSION 


The results indicate that the expert ratings of task difficulty may
not be representative of the actual task difficulty. The weak, nonsignificant correlation
(r=-.375, p>.10) between the weighted ratings of difficulty and the SQT2
results suggests that the squad leaders and platoon leaders may have been
guessing at task difficulty without relying on any common conception
of just what constitutes task difficulty. This interpretation depends, of course,
on whether the written component of the SQT is representative of
actual task performance. The SQT is criterion referenced and validated
against actual performance, as previously mentioned. The high correlation
(r=+.854, p<.001) between two administrations of the SQT, as well as the
high correlation (r=+.774, p<.02) between SQT2 and SQT4, lends support
to this idea and indicates that it is unlikely that the earlier SQT was
unrepresentative of SQTs in general.

It is possible that the criteria for task difficulty used
by the squad leaders and platoon leaders do not apply to, or are not
represented by, SQT2. The weighted rating of task difficulty might then
be expected not to correlate with the SQT2 results, but to be representative
of difficulty with respect to SQT4 only. The low correlation (r=-.264,
p>.10) between SQT4 and the weighted rating of difficulty suggests that this is
not the case.

The negative, though nonsignificant, correlation between the ratings of
task difficulty and SQT results for both inexperienced soldiers (SQT2, r=-.375)
and experienced soldiers (SQT4, r=-.264) is interesting, and is suggestive
of a trend. Because r is nonsignificant, this apparent trend should be attributed
only to random variance, but there may actually be a bias
in the ratings that results in the negative correlation. The most likely
reason for a trend, if such a trend exists, is that the raters tended to
rate all tasks as easy regardless of their relative difficulty. The fact
that many of the difficult tasks were rated as easy suggests a
lack of familiarity with difficult tasks.
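The observation that raters tended to call tasks easy can be checked descriptively against the judgement percentages in Table 1. This short sketch counts the tasks for which "easy" was the single most frequent rating. It also reports the mean share of "easy" judgements and the highest unrounded weighted difficulty. It is an illustrative check and not an analysis performed in the report.

```python
# Table 1 judgement percentages: task -> (% easy, % moderately difficult, % very difficult)
JUDGEMENTS = {
    "071-11A-0001": (66, 30, 4),
    "071-11A-0150": (66, 28, 6),
    "071-11A-0502": (49, 41, 10),
    "071-11A-0511": (66, 24, 10),
    "071-11A-0704": (38, 43, 19),
    "071-11A-0705": (44, 43, 13),
    "071-11A-0703": (51, 40, 9),
    "071-11A-0801": (81, 13, 6),
    "071-11A-0960": (54, 28, 18),
    "071-11A-2003": (82, 18, 0),
    "071-11A-2004": (63, 33, 4),
    "071-11B-2006": (69, 28, 3),
    "071-11A-2104": (50, 43, 7),
    "071-11A-2401": (57, 33, 10),
    "071-11A-1501": (65, 29, 6),
    "071-11A-4402": (71, 23, 6),
    "071-11A-4502": (65, 28, 7),
    "071-11A-4503": (37, 40, 23),
}

# Unrounded weighted difficulty with scale values 0, 50 and 100.
weighted = {task: (50 * m + 100 * v) / (e + m + v)
            for task, (e, m, v) in JUDGEMENTS.items()}

# Tasks for which "easy" was the single most frequent rating.
modal_easy = [task for task, (e, m, v) in JUDGEMENTS.items() if e > max(m, v)]
mean_easy = sum(e for e, _, _ in JUDGEMENTS.values()) / len(JUDGEMENTS)

print(f"'easy' is the most frequent rating for {len(modal_easy)} of {len(JUDGEMENTS)} tasks")
print(f"mean percent of 'easy' judgements: {mean_easy:.1f}")
print(f"highest weighted difficulty: {max(weighted.values()):.1f} on a 0-100 scale")
```

For these data, "easy" is the most frequent rating for 16 of the 18 tasks, and no task reaches the "moderately difficult" value of 50 on the weighted scale.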

The results do not imply that expert ratings cannot be used as
estimates of task difficulty, only that new methods must be devised
to ensure that relevant criteria for task difficulty are used across raters.
Crocker et al. (1977) suggest that one of the major sources of judgemental
uncertainty is a random and unpredictable environment, which produces
random and unpredictable information and makes it difficult for a judge to
discern a pattern in that information. This may be the case in the turbulent
military environment. One possible approach to future ratings of task
difficulty is to define for the rater just what factors or elements
contribute to task difficulty. However, defining the components of task
difficulty for the rater is itself a problem. Task difficulty can be
described in terms of task complexity, which can be resolved into functional
and process components. Functional complexity describes the number of
subtasks comprising a task, and process complexity describes the information-processing
requirements of a subtask (Teichner, 1974). The components of the
task that are described to the rater inherently depend upon the task taxonomy
used to provide the terminology, and upon the level of detail required to enable
the rater to discriminate between tasks along the dimension of difficulty.
Judgemental biases arising from multidimensional complexity descriptions might
be reduced through the techniques of sensitivity analysis (Fischhoff et al.,
1978).
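One way to make the functional/process distinction concrete for a rater is to record each task as an ordered list of subtasks. The following is a minimal sketch under loudly stated assumptions. The class names, the 1-5 processing-load scale, and the sample task are hypothetical, and none of them is drawn from Teichner (1974) or from any Army task taxonomy. The report does not specify how process complexity would be scored.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Subtask:
    name: str
    # Hypothetical 1-5 rating of the subtask's information-processing demand.
    processing_load: int

@dataclass
class Task:
    name: str
    subtasks: List[Subtask] = field(default_factory=list)

    def functional_complexity(self) -> int:
        """Functional complexity: the number of subtasks comprising the task."""
        return len(self.subtasks)

    def process_complexity(self) -> List[int]:
        """Process complexity: the processing demand of each subtask, in order."""
        return [s.processing_load for s in self.subtasks]

# A hypothetical task description of the kind that might be shown to a rater.
task = Task("hypothetical task", [
    Subtask("step 1", 2),
    Subtask("step 2", 4),
    Subtask("step 3", 1),
])
print(task.functional_complexity(), task.process_complexity())
```

Keeping the two components separate, rather than collapsing them into a single score, leaves open how they should be combined. That combination is exactly the open question the paragraph above raises.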